An Algorithm for finding Noun Phrase Correspondences in Bilingual Corpora

نویسنده

Julian Kupiec

چکیده

94304 Abstract The paper describes an algorithm that employs English and French text taggers to associate noun phrases in an aligned bilingual corpus. The tag-gets provide part-of-speech categories which are used by finite-state recognizers to extract simple noun phrases for both languages. Noun phrases are then mapped to each other using an iterative re-estimation algorithm that bears similarities to the Baum-Welch algorithm which is used for training the taggers. The algorithm provides an alternative to other approaches for finding word correspondences , with the advantage that linguistic structure is incorporated. Improvements to the basic algorithm are described, which enable context to be accounted for when constructing the noun phrase mappings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

BUILDING AN MT I)ICTIONARY FROM PARAI~LEI~ TEXTS BASED ON LINGUISTIC AND STATISTICAL INIi'ORMATION

A method for generating a machine translation (MT) dictionary from parallel texts is described. This method utilizes both statistical information and linguistic information to obtain corresponding words or phrases in parallel texts. By combining these two types of information, translation pairs which cannot be obtained by a linguistic-based method can be extntcted. Over 70% accurate translation...

متن کامل

Automat ic Extraction of Word Sequence Correspondences in Parallel Corpora

This paper proposes a method of finding correspondences of arbitrary length word sequences in aligned parallel corpora of Japanese and English. Translation candidates of word sequences are evaluated by a similarity measure between the sequences defined by the co-occurrence frequency and independent frequency of the word sequences. The similarity measure is an extension of Dice coefficient. An i...

متن کامل

Automatic Extraction of Word Sequence Correspondences in Parallel Corpora

متن کامل

Building An MT Dictionary From Parallel Texts Based On Linguistic And Statistical Information

متن کامل

Bilingual phrases for statistical machine translation

The statistical framework has proved to be very successful in machine translation. The main reason for this success is the existence of powerful techniques that allow to build machine translation systems automatically from available parallel corpora. Most of statistical machine translation approaches are based on single-word translation models, which do not take bilingual contextual information...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1993

An Algorithm for finding Noun Phrase Correspondences in Bilingual Corpora

نویسنده

چکیده

منابع مشابه

BUILDING AN MT I)ICTIONARY FROM PARAI~LEI~ TEXTS BASED ON LINGUISTIC AND STATISTICAL INIi'ORMATION

Automat ic Extraction of Word Sequence Correspondences in Parallel Corpora

Automatic Extraction of Word Sequence Correspondences in Parallel Corpora

Building An MT Dictionary From Parallel Texts Based On Linguistic And Statistical Information

Bilingual phrases for statistical machine translation

عنوان ژورنال:

اشتراک گذاری